AMENDMENT TO THE CLAIMS
1. (currently amended) A computer readable storage media storing instructions readable by a 
computer which, when implemented, cause the computer to perform a method comprising: 

with a processor; 

segmenting a sentence of Chinese characters into constituent Chinese words
having one or more Chinese characters by performing a Forward
Maximum Matching (FMM) segmentation of the input sentence and a
Backward Maximum Matching (BMM) segmentation of the input
sentence;

tokenizing the sentence of characters into known characters and at least one
overlapping ambiguity string;

recognizing an overlapping ambiguity string in the segmented sentence, wherein
the overlapping ambiguity string comprises at least three Chinese 
characters having at least two possible segmentations, wherein each 
possible segmentation comprises a right portion and a left portion and 
wherein the left portion and the right portion remain in the tokenized 
corpus and the at least one overlapping ambiguity string is removed from
the tokenized corpus;

obtaining probability information relating to context for each possible
segmentation, wherein the probability information is based on at least one
context feature adjacent the overlapping ambiguity string and one of the
right portion or left portion of the possible segmentation, and wherein the
at least one context feature comprises a Chinese character; and

outputting an indication for selecting one of the at least two possible 
segmentations as a function of the obtained probability information. 
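
The segmentation and recognition steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the toy lexicon, the `max_len` limit, and all function names are assumptions introduced here for illustration, and the overlap test (FMM and BMM both split a span of three or more characters, at different positions) is one plausible reading of the claim language.

```python
# Toy lexicon (hypothetical, for illustration only).
LEXICON = {"各国", "国有", "企业"}


def fmm(sentence, lexicon, max_len=4):
    """Forward Maximum Matching: scan left to right, taking the longest lexicon word."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if size == 1 or word in lexicon:  # a single character always matches
                tokens.append(word)
                i += size
                break
    return tokens


def bmm(sentence, lexicon, max_len=4):
    """Backward Maximum Matching: scan right to left, taking the longest lexicon word."""
    tokens, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            word = sentence[j - size:j]
            if size == 1 or word in lexicon:
                tokens.append(word)
                j -= size
                break
    return tokens[::-1]


def cuts(tokens):
    """Character offsets of the token boundaries, including 0 and the sentence end."""
    offsets, pos = {0}, 0
    for token in tokens:
        pos += len(token)
        offsets.add(pos)
    return offsets


def split_at(text, start, end, offsets):
    """Split text[start:end] at every boundary offset that falls strictly inside it."""
    points = [start] + sorted(c for c in offsets if start < c < end) + [end]
    return [text[a:b] for a, b in zip(points, points[1:])]


def overlapping_ambiguities(sentence, lexicon):
    """Yield (string, O_f, O_b) for spans of 3+ characters where FMM and BMM disagree."""
    f_cuts, b_cuts = cuts(fmm(sentence, lexicon)), cuts(bmm(sentence, lexicon))
    common = sorted(f_cuts & b_cuts)
    for start, end in zip(common, common[1:]):
        o_f = split_at(sentence, start, end, f_cuts)
        o_b = split_at(sentence, start, end, b_cuts)
        if end - start >= 3 and len(o_f) > 1 and len(o_b) > 1:
            yield sentence[start:end], o_f, o_b
```

With this toy lexicon, `fmm("各国有企业", LEXICON)` segments the sentence as 各国 / 有 / 企业 and `bmm` segments it as 各 / 国有 / 企业, so the three-character string 各国有 is reported with its two candidate segmentations.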



2. (currently amended) The computer readable storage media of claim 1, wherein obtaining the 
probability information comprises obtaining probability information from a language model. 
3. (Previously presented) The computer readable storage media of claim 2 wherein the language
model comprises a trigram model. 

4. (previously presented) The computer readable storage media of claim 2 wherein outputting 
an indication for selecting one of the at least two possible segmentations comprises 
classifying the probability information. 

5. (Previously presented) The computer readable storage media of claim 4 wherein classifying 
comprises classifying using Naive Bayesian Classification. 

6. (Canceled) 

7. (Currently amended) The computer readable storage media of claim 1, wherein recognizing
the overlapping ambiguity string comprises recognizing a possible segmentation O_f of the
overlapping ambiguity string from the FMM segmentation and a possible segmentation O_b
of the overlapping ambiguity string from the BMM segmentation.

8. (currently amended) The computer readable storage media of claim 7, wherein outputting 
the indication comprises selecting one of the at least two possible segmentations as a function of 
a set of context features surrounding the overlapping ambiguity string. 

9. (previously presented) The computer readable storage media of claim 8 wherein the set of 
context features comprises words or grammatical features surrounding the overlapping ambiguity 
string. 

10. (previously presented) The computer readable storage media of claim 8, wherein outputting
the indication comprises classifying the probability information of the set of context features and
O_f.



11. (previously presented) The computer readable storage media of claim 8, wherein outputting
the indication comprises classifying the probability information of the set of context features and
O_b.

12. (previously presented) The computer readable storage media of claim 8, wherein outputting the
indication comprises determining which of O_f or O_b has a higher probability as a function of
the set of context features. 
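
As a hedged illustration of the selection recited in claims 8 to 12, the sketch below scores O_f and O_b with a Naive Bayes rule, log P(G) plus the sum of log P(c | G) over the context features c, and picks the higher score. The probability tables, the `floor` value for unseen features, and the function names are assumptions made for this example; they are not taken from the claims.

```python
import math

# Toy priors and per-feature likelihoods (hypothetical values, for illustration only).
MODEL = {
    "O_f": {"prior": 0.4, "likelihood": {"在": 0.2, "企业": 0.01}},
    "O_b": {"prior": 0.6, "likelihood": {"在": 0.05, "企业": 0.3}},
}


def nb_score(prior, likelihood, context, floor=1e-6):
    """log P(G) + sum of log P(c | G), assuming features are independent given G."""
    return math.log(prior) + sum(math.log(likelihood.get(c, floor)) for c in context)


def choose(context, model):
    """Return the label ('O_f' or 'O_b') with the higher Naive Bayes score."""
    return max(
        model,
        key=lambda g: nb_score(model[g]["prior"], model[g]["likelihood"], context),
    )
```

With these toy numbers, the context features 在 and 企业 together favor O_b (0.6 × 0.05 × 0.3 against 0.4 × 0.2 × 0.01), while 在 alone favors O_f.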

13. (cancelled) 

14. (currently amended) A method of segmentation of a sentence of Chinese text, the 
sentence having an overlapping ambiguity string, the method comprising: 

with a processor; 

generating a first set of tokens utilizing a Forward Maximum Matching (FMM)
segmentation of the sentence;
generating a second set of tokens utilizing a Backward Maximum Matching
(BMM) segmentation of the sentence;

comparing the first set of tokens and the second set of tokens to determine
common tokens and differing sets of tokens;

recognizing the overlapping ambiguity string based on the FMM segmentation and
the BMM segmentation;
determining the constituent lexical words in the overlapping ambiguity string;

retaining the constituent lexical words in the tokenized sentence and removing the
overlapping ambiguity string from the tokenized sentence;
obtaining probability information related to context based on at least one context
feature surrounding the overlapping ambiguity string and [illegible] the
constituent lexical words, wherein the at
least one context feature comprises a Chinese character; and
outputting an indication for selecting one of the FMM segmentation and the BMM 
segmentation as a function of obtained probability information. 

15. (previously presented) The method of claim 14 wherein outputting includes selecting one 
of the FMM segmentation of the overlapping ambiguity string and the BMM segmentation of the 
overlapping ambiguity string based on higher probability. 

16. (previously presented) The method of claim 15 wherein obtaining probability information 
comprises using an N-gram model. 

17. (previously presented) The method of claim 16 wherein obtaining probability information 
comprises obtaining probability information about a first word of the overlapping ambiguity 
string. 

18. (previously presented) The method of claim 16, wherein obtaining probability information 
comprises using probability information about a last word of the overlapping ambiguity string. 

19. (previously presented) The method of claim 16, wherein obtaining probability information 
comprises using the N-gram model that includes probability information for context words 
surrounding the overlapping ambiguity string. 

20. (previously presented) The method of claim 16, wherein using the N-gram model 
comprises using trigram probability information about a string of words comprising a first word 
of the overlapping ambiguity string and two context words to the left of the first word. 

21. (previously presented) The method of claim 16, wherein using the N-gram model
comprises using trigram probability information about a string of words comprising a last word 
of the overlapping ambiguity string and two context words to the right of the last word. 
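
The trigram information of claims 20 and 21 can be sketched as follows. This is only one way to read the claims: the toy corpus, the boundary padding symbols `<s>` and `</s>`, the unsmoothed maximum-likelihood estimate, and the function names are all assumptions introduced for illustration.

```python
from collections import Counter

# Toy pre-segmented corpus (hypothetical, for illustration only).
CORPUS = [
    ["我们", "在", "各", "国有", "企业"],
    ["我们", "在", "各国", "有", "朋友"],
]


def train(sentences):
    """Count trigrams and their two-word histories, padding each sentence boundary."""
    tri, hist = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>", "</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
            hist[(a, b)] += 1
    return tri, hist


def trigram_prob(tri, hist, a, b, c):
    """Maximum-likelihood estimate of P(c | a, b); 0.0 when the history is unseen."""
    return tri[(a, b, c)] / hist[(a, b)] if hist[(a, b)] else 0.0


TRI, HIST = train(CORPUS)

# Claim 20 reading: first word of a candidate, with two context words to its left.
p_left_f = trigram_prob(TRI, HIST, "我们", "在", "各国")  # first word of O_f
p_left_b = trigram_prob(TRI, HIST, "我们", "在", "各")    # first word of O_b

# Claim 21 reading: last word of a candidate, followed by two words to its right.
p_right_f = trigram_prob(TRI, HIST, "有", "企业", "</s>")    # last word of O_f
p_right_b = trigram_prob(TRI, HIST, "国有", "企业", "</s>")  # last word of O_b
```

A practical model would add smoothing so that unseen histories do not yield a probability of zero; the sketch omits it to keep the counting step visible.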

22. (previously presented) The method of claim 14, wherein outputting includes using Naive 
Bayesian Classifiers. 

23. (previously presented) The method of claim 14, wherein obtaining probability information 
comprises obtaining trigram probability information and constructing an ensemble of Naive 
Bayesian Classifiers from the trigram probability information. 

24. (previously presented) The method of claim 23, wherein outputting an indication 
comprises identifying one of the FMM segmentation and the BMM segmentation based on 
probability calculated from the ensemble of Naive Bayesian Classifiers. 

25. (currently amended) A method of segmenting a sentence of Chinese text comprising: 

with a processor; 

segmenting a sentence of Chinese characters into constituent Chinese words
having one or more Chinese characters by performing a Forward
Maximum Matching (FMM) segmentation of the input sentence and a
Backward Maximum Matching (BMM) segmentation of the input
sentence;

tokenizing the sentence of characters into known characters and at least one
overlapping ambiguity string;
determining the constituent lexical words in the overlapping ambiguity string; 
retaining the constituent lexical words in the tokenized sentence and removing the
overlapping ambiguity string from the tokenized sentence;
receiving probability information related to context from an N-gram language
model comprising probability information for the constituent lexical words
[illegible] and context features surrounding the
overlapping ambiguity string, wherein the context features comprise at
least one Chinese character;
resolving the overlapping ambiguity string based on the received probability
information.

26. (previously presented) The method of claim 25, wherein receiving probability information 
comprises receiving probability information from a trigram language model. 

27. (previously presented) The method of claim 25, and further comprising generating an 
ensemble of classifiers with the received probability information. 

28. (canceled) 

29. (currently amended) The method of claim 25, and further comprising generating an
ensemble of classifiers as a function of the N-gram model. 

30. (previously presented) The method of claim 29 wherein generating the ensemble of 
classifiers includes approximating probabilities of the FMM and BMM segmentations of the 
overlapping ambiguity string as being equal to the product of individual unigram probabilities of 
individual words in the FMM and BMM segmentations of the overlapping ambiguity string. 
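
The approximation in claim 30 amounts to multiplying the unigram probabilities of the words in each candidate segmentation. A minimal sketch, with made-up unigram values and a hypothetical helper name, might look like this:

```python
import math

# Toy unigram probabilities (hypothetical values, for illustration only).
UNIGRAM = {"各国": 0.002, "有": 0.01, "各": 0.004, "国有": 0.001}


def segmentation_prob(words, unigram):
    """Approximate P(segmentation) as the product of its words' unigram probabilities."""
    return math.prod(unigram[w] for w in words)


p_f = segmentation_prob(["各国", "有"], UNIGRAM)  # FMM segmentation O_f
p_b = segmentation_prob(["各", "国有"], UNIGRAM)  # BMM segmentation O_b
```

Under these assumed values the FMM segmentation receives the larger prior; in a classifier this quantity would be combined with the context-feature terms rather than used on its own.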

31. (previously presented) The method of claim 29, wherein generating the ensemble of 
classifiers includes approximating a joint probability of a set of context features conditioned on an 
existence of one of the segmentations of the overlapping ambiguity string based on a corresponding 
probability of a leftmost and a rightmost word of the corresponding overlapping ambiguity string. 



